Method and system for generation of at least one output analytic for a promotion

ABSTRACT

There is provided a method and system for generating an output analytic for a promotion. The method includes determining, using an optimization machine learning model trained or instantiated with an optimization training set, at least one determined parameter for the promotion which optimizes at least one of the received input parameters, the optimization training set comprising received historical data; forecasting, using a promotion forecasting machine learning model trained or instantiated with a forecasting training set, at least one output analytic of the promotion, the forecasting training set comprising the received historical data, the at least one received input parameter and the at least one determined parameter; and outputting the at least one output analytic to a user.

TECHNICAL FIELD

The following relates generally to marketing materials and more specifically to a method and system for generation of at least one output analytic for a promotion.

BACKGROUND

A common tactic for businesses to gain greater awareness of their products or services, and to attempt to spur positive business results, is to undertake promotions. Promotions can take a number of different forms, such as the generation and distribution of promotional materials, discounts, giveaways, coupons, competitions, or the like. In particular, promotional materials can include, for example, flyers, brochures, leaflets, inserts, admail, or the like. These materials can be in the form of physical hard copies or digital soft copies. In some cases, such as with flyers, the promotional materials can be used to highlight to consumers certain products, such as those products which are on sale or those products which have noteworthy aspects.

Conventionally, in order to create a promotion, personnel of a business or agency would use their best judgment, on an ad hoc basis, to decide the macro-goal of the promotion and the various aspects of the promotion intended to achieve that goal. Such an approach is typically time-consuming and inconsistent, and may have to rely on outside help, such as suppliers, who may have different interests. Thus, the conventional approach to promotions is generally not optimized because of its holistic approach and because it is not particularly systematic or statistically selective.

SUMMARY

In an aspect, there is provided a computer-implemented method for generation of at least one output analytic for a promotion, the method comprising: receiving historical data related to one or more products and a plurality of previous promotions; receiving at least one received input parameter for the promotion from a user, at least one of the input parameters comprising a macroscopic objective of the promotion; determining, using an optimization machine learning model trained or instantiated with an optimization training set, at least one determined parameter for the promotion which optimizes at least one of the received input parameters, the optimization training set comprising the received historical data; forecasting, using a promotion forecasting machine learning model trained or instantiated with a forecasting training set, at least one output analytic of the promotion, the forecasting training set comprising the received historical data, the at least one received input parameter and the at least one determined parameter; and outputting the at least one output analytic to the user.

In a particular case, the promotion forecasting machine learning model comprises at least one of an average price model and a regression model, the average price model comprises a Random Forest model to predict an average effective discounted price of the promotion based on a category of products, the regression model incorporating covariates to predict demand.

In another case, the regression model is used to determine the discounted price prediction on a per-product basis on a group of products in the same brand or subcategory, or both.

In yet another case, the regression model incorporates indicator variables for the one or more products, determining the indicator variables comprising, for each product, normalizing absolute units by a mean for periods with no promotion, and where such mean is not available, normalizing by the mean of the product's entire history.

In yet another case, the promotion forecasting machine learning model comprises a first Ridge Regression model combined with a second Ridge Regression model, the first Ridge Regression model comprising at least one training set feature different than the second Ridge Regression model.

In yet another case, the historical data comprises one or more products in a similar category or brand.

In yet another case, the plurality of previous promotions are aggregated in a stacked relationship.

In yet another case, the historical data comprises transaction history for the product and one or more other products in the same product category, the transaction history comprising at least one of date sold, product, units sold, and price sold.

In yet another case, the at least one output analytic comprises one of promotion lift, cannibalization, halo effect, pull forward, and price elasticity of demand.

In yet another case, the method further comprising determining a confidence indicator to indicate the reliability of the forecast, determining the confidence indicator comprises: determining if the forecast is in a predetermined scope; and determining, using an accuracy machine learning model trained or instantiated with an accuracy training set, the confidence indicator, the accuracy training set comprising previous forecasts and their respective actualized values.

In another aspect, there is provided a system for generation of at least one output analytic for a promotion, the system comprising one or more processors and a data storage device, the one or more processors configured to execute: an input module to receive historical data related to one or more products and a plurality of previous promotions, the input module further receiving at least one received input parameter for the promotion from a user, at least one of the input parameters comprising a macroscopic objective of the promotion; a machine learning module to build an optimization machine learning model trained or instantiated with an optimization training set and, using the optimization machine learning model, determine at least one determined parameter for the promotion which optimizes at least one of the received input parameters, the optimization training set comprising the received historical data, the machine learning module further building a promotion forecasting machine learning model trained or instantiated with a forecasting training set and, using the promotion forecasting machine learning model, forecasting at least one output analytic of the promotion, the at least one received input parameter and the at least one determined parameter; and an output module to output the at least one output analytic to the user.

In a particular case, the promotion forecasting machine learning model comprises at least one of an average price model and a regression model, the average price model comprises a Random Forest model to predict an average effective discounted price of the promotion based on a category of products, the regression model incorporating covariates to predict demand.

In another case, the regression model is used to determine the discounted price prediction on a per-product basis on a group of products in the same brand or subcategory, or both.

In yet another case, the regression model incorporates indicator variables for the one or more products, the machine learning module determines the indicator variables by, for each product, normalizing absolute units by a mean for periods with no promotion, and where such mean is not available, normalizing by the mean of the product's entire history.

In yet another case, the promotion forecasting machine learning model comprises a first Ridge Regression model combined with a second Ridge Regression model, the first Ridge Regression model comprising at least one training set feature different than the second Ridge Regression model.

In yet another case, the one or more processors further configured to execute a confidence module to determine a confidence indicator, the confidence indicator indicates the reliability of the forecast, the confidence module determines the confidence indicator by: determining if the forecast is in a predetermined scope; and determining, using an accuracy machine learning model trained or instantiated with an accuracy training set, the confidence indicator, the accuracy training set comprising previous forecasts and their respective actualized values.

In another aspect, there is provided a computer-implemented method for generation of at least one output analytic for promotional materials, the method comprising: receiving historical data related to one or more products and a plurality of previous promotional materials; receiving one or more input parameters related to the promotional materials from a user; selecting, using a machine learning model trained or instantiated with a selection training set, a configuration and a layout for the one or more products on the promotional materials, the selection training set comprising the historical data and the one or more input parameters, the selection comprising: assigning a prominence weight to each of the one or more products; normalizing the prominence weight for each of the one or more products; determining a block structure for the promotional materials based on the prominence weight of each of the one or more products; and determining a location for each of the products on the promotional materials based on the prominence weight of each of the one or more products; and outputting the promotional materials based on the selection of the configuration and layout.

In a particular case, the method further comprising selecting, using the selection machine learning model, the one or more products to be promoted on the promotional materials.

In another aspect, there is provided a computer-implemented method for generation of at least one output analytic for per-store unit demand, the method comprising: receiving historical data related to one or more products, the historical data comprising historical inventory level of the one or more products at a retail store; forecasting, using a demand machine learning model trained or instantiated with a demand training set, a demand for the one or more products at the retail store, the demand machine learning model comprising a first model for predicting the total unit demand for the retail store and a second model for predicting the demand in the retail store for the one or more products, the demand training set comprising the historical data, the forecast comprising multiplying the prediction of the total unit demand for the retail store for a predetermined time-period by the prediction of the demand in the retail store for the one or more products; and outputting the at least one output analytic to the user.

In a particular case, the forecast further comprises adding a covariate for a stock out condition.

These and other aspects are contemplated and described herein. It will be appreciated that the foregoing summary sets out representative aspects of systems and methods to assist skilled readers in understanding the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The features of the invention will become more apparent in the following detailed description in which reference is made to the appended drawings wherein:

FIG. 1 is a schematic diagram of a system for promotion optimization in accordance with an embodiment;

FIG. 2 is a schematic diagram showing the system of FIG. 1 and an exemplary operating environment;

FIG. 3 is a flow chart of a method for promotion optimization in accordance with an embodiment;

FIG. 4 is an exemplary promotion forecast generated by the system of FIG. 1;

FIG. 5 is another exemplary promotion forecast generated by the system of FIG. 1;

FIG. 6 is a schematic diagram of a system for optimizing promotional materials in accordance with an embodiment;

FIG. 7 is a schematic diagram showing the system of FIG. 6 and an exemplary operating environment;

FIG. 8 is a flow chart of a method for optimizing promotional materials in accordance with an embodiment;

FIG. 9 illustrates an exemplary advertising material generated by the system of FIG. 6; and

FIG. 10 is a flow chart of a method for generation of at least one output analytic for per-store unit demand, according to an embodiment.

DETAILED DESCRIPTION

Embodiments will now be described with reference to the figures. For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the Figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the embodiments described herein. Also, the description is not to be considered as limiting the scope of the embodiments described herein.

Various terms used throughout the present description may be read and understood as follows, unless the context indicates otherwise: “or” as used throughout is inclusive, as though written “and/or”; singular articles and pronouns as used throughout include their plural forms, and vice versa; similarly, gendered pronouns include their counterpart pronouns so that pronouns should not be understood as limiting anything described herein to use, implementation, performance, etc. by a single gender; “exemplary” should be understood as “illustrative” or “exemplifying” and not necessarily as “preferred” over other embodiments. Further definitions for terms may be set out herein; these may apply to prior and subsequent instances of those terms, as will be understood from a reading of the present description.

Any module, unit, component, server, computer, terminal, engine or device exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the device or accessible or connectable thereto. Further, unless the context clearly indicates otherwise, any processor or controller set out herein may be implemented as a singular processor or as a plurality of processors. The plurality of processors may be arrayed or distributed, and any processing function referred to herein may be carried out by one or by a plurality of processors, even though a single processor may be exemplified. Any method, application or module herein described may be implemented using computer readable/executable instructions that may be stored or otherwise held by such computer readable media and executed by the one or more processors.

The following relates generally to a method and system for generation of at least one output analytic for a promotion. The method and system provide a technological approach for analyzing historical data in order to automatically solve the technical problem of determining optimized aspects of a promotion using historical data. In some cases, predictions can be generated with respect to the aspects of the promotion.

The promotion can include any suitable marketing or publicizing of one or more products. Product, as used herein, is understood to include goods, services, or any suitable offering by a company or person; and can include, for example, any merchandise, service, venture, event, subscription, or donation opportunity that is offered, promoted, sold, or otherwise advertised to consumers, businesses, or the general public. Product can also include a product group, subcategory of products, or any other collection of products. Store, as generally used herein, can include any suitable establishment or interface in which goods or services can be sold or provided, including those provided online.

In a particular embodiment, as described herein, the promotion can include generation of promotional materials. Promotional materials can include, for example, flyers, brochures, leaflets, inserts, admail, or the like. The promotional materials referred to herein can be in the form of physical hard copies or digital soft copies.

Referring now to FIG. 1, a system 10 for generation of promotion analytics, in accordance with an embodiment, is shown. In this embodiment, the system 10 is run on a server. In further embodiments, the system 10 can be run on any other computing device; for example, a desktop computer, a laptop computer, a smartphone, a tablet computer, a point-of-sale (“PoS”) device, a smartwatch, or the like. The system 10 enables the determination of analytical data features of a promotion.

FIG. 1 further shows various physical and logical components of an embodiment of the system 10. As shown, the system 10 has a number of physical and logical components, including a central processing unit (“CPU”) 12, random access memory (“RAM”) 14, an input interface 16, an output interface 18, a network interface 20, non-volatile storage 22, and a local bus 13 enabling CPU 12 to communicate with the other components. CPU 12 executes an operating system, and various modules, as described below in greater detail. RAM 14 provides relatively responsive volatile storage to CPU 12. The input interface 16 enables an administrator or user to provide input via an input device, for example a keyboard and mouse. The output interface 18 outputs information to output devices, such as a display and/or speakers. The network interface 20 permits communication with other systems, such as other computing devices and servers. Non-volatile storage 22 stores the operating system and programs, including computer-executable instructions for implementing the operating system and modules, as well as any data used by these services. Additional stored data, as described below, can be stored in a database 24. During operation of the system 10, the operating system, the modules, and the related data may be retrieved from the non-volatile storage 22 and placed in RAM 14 to facilitate execution. In an embodiment, the system 10 further includes an input module 26, an output module 28, and a machine learning module 30.

In some cases, as shown in a diagram of a computing environment 90 in FIG. 2, the system 10 can communicate with, and retrieve data from, other computing devices; for example, a laptop computer 38 and a server 36. The system can communicate with these devices over a data communication network; for example, the Internet 34.

Promotions can come in different types and be directed to different targets. “Mass promotions” can be directed to the general public; for example, a promotion for milk for $1.99 to anyone who walks into a store. “Direct promotions” can be directed and sent to a specific person or group's contact; for example, via email, short message service, traditional mail, apps, or the like. “Select promotions” can be directed to persons who are part of a selected group; for example, promotions only available to individuals who have a loyalty card.

A common problem with developing promotions is determining optimal analytical aspects of the promotion and using such analytics to forecast demand. As an example, given a product (identified or referred to by its stock keeping unit (SKU)), its history of promotions and transactions, and any additional relevant causal factors, the problem is to predict the future demand of the SKU for a given promotion mechanic.

Turning to FIG. 3, a flowchart is shown for a method 50 for generation of promotion analytics. At block 51, historical data related to one or more products and a plurality of previous promotions is received by the input module 26. At block 52, at least one input parameter of the promotion, as described below, is received from the user via the input interface 16 to the input module 26. In most cases, the input parameter received from the user will reflect the macroscopic objective of the promotion; for example, product uplift, product sales target, or product demand targets. In some cases, the input parameter received from the user can also include input parameters that constrain the promotion, such as transaction history, which product is to be promoted, how the product is to be promoted, or other limitations.

At block 54, the machine learning module 30 passes the received input parameters through a machine learning model, as described herein. In some cases, the machine learning model can use time series approaches that primarily use historical data as a basis for analytically estimating future behavior. Time series approaches can include, for example, ARIMAX, AR, Moving Average, Exponential smoothing, or the like. In further cases, the machine learning model can use regression-based approaches that use a variety of factors (including past data points) to predict future outcomes with an implicit concept of time (through the data points). Regression-based approaches can include, for example, linear regression, random forest, neural network, or the like.

At block 56, in most cases, the input module 26 determines, via the machine learning module 30, at least one other input parameter of the promotion. The input parameters, as described herein, can include, for example, product selection, promotion mechanics, time period for promotion, other causal factors, or the like. The input module 26 determines values for the other input parameters which optimize the forecast based on the constraints of the received input parameters.

The historical data can include transaction history, for example, historical data relating to the product. The historical data can include, for example, date sold, product, units sold, price sold, or the like. In some cases, the transaction history can also include historical data relating to similar products, for example, products in the same product category.

The promotion mechanics can include the device or arrangement in which the product is promoted. For example, the promotion mechanic can be a promotion whereby the customer can buy two of the products and get one free, lasting from April 1st to April 14th. In another example, the promotion mechanic can be a promotion whereby the product's price is reduced 50%, lasting from December 1st to December 25th.

The other causal factors can include, for example, budget, vendor subsidy, seasonality, distribution, appearance, stock, SKU age, star buy, feature promotion, points promotion, shelving, or the like. Seasonality can include, for example, the time of year of the promotion, whether it is around the holiday season, or the like. Distribution can include, for example, which stores receive the promotion, the quantity of stores that receive the promotion, which areas of the country receive the promotion, or the like. Appearance can include, for example, which shelves the product is displayed on in a retail environment (such as on end-cap shelving), whether there is additional signage around the product, or the like. Stock can include, for example, whether the promotion is for a limited run of stock of the product, the stockout percentage, or the like. SKU age can include, for example, the time since first sale of the product. Star buy can include, for example, an enhanced special promotion, for a very limited time, in which the product is heavily promoted, in addition to a steep discount in price. Feature promotion can include, for example, a promotion which does not include a price change but in which the product is more heavily featured in the store or in advertising. Points promotion can include, for example, providing additional points for purchase of the product in a loyalty program. Shelving can include, for example, selecting featured shelving for the product or providing the product on secondary shelves where the product is not typically located.

At block 58, output analytics related to the promotion are determined by the output module 28 via the machine learning module 30. The output analytics can include, for example, forecasted demand, forecasted price, baseline demand, baseline price, forecast without promotion mechanics, inventory forecasting (such as at a warehouse or store level), or the like. Forecasted demand can be a determination of the number of units of the product projected to sell for a given period while the product is on promotion; in some cases, this can be analyzed for each given promotion mechanic. In some cases, this forecast can be at a per-store level, per-distribution centre level, or the like. Forecasted price can be an analysis of the forecasted average price per unit of the product while the product is on promotion. Baseline demand can be a determination of the number of units projected to sell for the given period had the product not been put on promotion. Baseline price can be a determination of the average price per unit of the product had the product not been put on promotion.

At block 60, other secondary analytics can be derived from the output analytics by the output module 28. For example: deriving uplift as a result of the promotion; deriving cannibalization as a result of the promotion; deriving halo effect as a result of the promotion; deriving basket penetration as a result of the promotion (the number of distinct transactions that a product has been in for a given time period); deriving residual basket value (the average basket size when the product is sold, minus the product); and the like.
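By way of illustration only, the two basket analytics defined above can be sketched as follows. The data layout (a mapping from transaction identifier to line values per SKU), the function names, and the sample figures are assumptions of this sketch, as is the reading of "basket size" as basket value.

```python
from statistics import mean


def basket_penetration(baskets, sku):
    """Count the distinct transactions in the period that contain the SKU."""
    return sum(1 for items in baskets.values() if sku in items)


def residual_basket_value(baskets, sku):
    """Average basket value, excluding the SKU itself, over baskets containing it."""
    residuals = [
        sum(value for item, value in items.items() if item != sku)
        for items in baskets.values()
        if sku in items
    ]
    return mean(residuals) if residuals else 0.0


# Hypothetical transactions for one period: transaction id -> {SKU: line value}
baskets = {
    "t1": {"milk": 1.99, "bread": 3.00, "eggs": 4.00},
    "t2": {"bread": 3.00},
    "t3": {"milk": 1.99, "coffee": 9.00},
}

penetration = basket_penetration(baskets, "milk")   # two baskets contain milk
residual = residual_basket_value(baskets, "milk")   # mean of 7.00 and 9.00
```

Comparing these values between a promotion period and a preceding period is one way such secondary analytics could be attributed to the promotion.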

In some cases, the user, via the input interface 16, can adjust the input parameters in the input module 26 after a forecast has been provided, in order to determine which input parameters arrive at a forecast for a desired output.

The machine learning module 30 can use machine learning techniques with the machine learning model (called a promotion forecasting model) to forecast output analytics. The promotion forecasting model can be trained using input parameters related to past promotions. In further cases, the promotion forecasting model can be instantiated with data, such as transaction history, provided by a user.

In some cases, multiple promotion forecasting models can be used such that their results may be averaged or weighted accordingly.
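As a minimal sketch of how results from several models might be averaged or weighted, the following combines point forecasts with a normalized weighted mean. The function name, the choice of weights, and the sample forecasts are hypothetical and are not taken from the described system.

```python
def combine_forecasts(forecasts, weights=None):
    """Return the weighted mean of several model forecasts.

    With no weights supplied, every model counts equally (a simple average).
    """
    if not forecasts:
        raise ValueError("at least one forecast is required")
    if weights is None:
        weights = [1.0] * len(forecasts)
    if len(weights) != len(forecasts):
        raise ValueError("one weight per forecast is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(f * w for f, w in zip(forecasts, weights)) / total_weight


# Hypothetical unit-demand forecasts from three models for the same promotion
model_forecasts = [100.0, 120.0, 110.0]

simple_average = combine_forecasts(model_forecasts)
weighted_average = combine_forecasts(model_forecasts, weights=[0.5, 0.25, 0.25])
```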

In a particular case, the machine learning module 30 can use a regression model to determine forecasted lift as a result of a promotion. Since there may not be a control group for the prediction, the incremental lift can be estimated in terms of sales or units sold. For example:

-   log(daily_sales_(promo(dow))) = β₁ log(daily_sales_(pre(dow))) + β₂(promos_(promo(dow)) − promos_(pre(dow))) + β₃I(Mon_(promo(dow))) + . . . + β₉I(Sun_(promo(dow)))

Where:

-   -   daily_sales_(promo(dow)): is the average daily sales for a given product group for that particular day of the week in the promotion period;
    -   daily_sales_(pre(dow)): is the average daily sales for a given product group for that day of the week in a preceding period prior to the promotion;
    -   promos_(P(dow)): is the average number of promotions for the given product group on that day of the week for the given period P; and
    -   I( ): is the indicator function (1/0) depending on whether day t is the given day of the week.
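The covariates of the regression above can be assembled, for one day of the week, roughly as in the following sketch. The function names, the ordering of the covariates, and the coefficient values are assumptions made for illustration; in practice the coefficients would be fitted from historical data rather than written by hand.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def design_row(pre_daily_sales, promos_promo, promos_pre, dow):
    """Build the covariates for one day of the week.

    Order: log of pre-period daily sales, difference in promotion counts,
    then seven day-of-week indicators (Mon..Sun).
    """
    if dow not in DAYS:
        raise ValueError(f"unknown day of week: {dow}")
    indicators = [1.0 if day == dow else 0.0 for day in DAYS]
    return [math.log(pre_daily_sales), promos_promo - promos_pre] + indicators


def predict_log_daily_sales(row, betas):
    """Evaluate the right-hand side of the regression for one row."""
    if len(row) != len(betas):
        raise ValueError("one coefficient per covariate is required")
    return sum(beta * x for beta, x in zip(betas, row))


# Hypothetical inputs: Wednesday, 200 in average pre-period daily sales,
# three promotions in the promotion period versus one before it.
row = design_row(200.0, 3.0, 1.0, "Wed")

# Hypothetical coefficients beta_1..beta_9 (not fitted values).
betas = [1.0, 0.1, 0.0, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0]
log_prediction = predict_log_daily_sales(row, betas)
predicted_daily_sales = math.exp(log_prediction)
```

Because the model is fitted in log space, the prediction is exponentiated to return to the scale of daily sales.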

In the above case, regression is performed on a daily basis to account for the fact that many promotions do not align with weekly boundaries. In further cases, other time periods may be used. Additionally, the above case treats each promotion equally (without regard to the type) and considers the number of promotions (being the difference in the number of promotions between the preceding and promotion periods).

In further cases, an L2 regularized regression (ridge regression) can be used to achieve an intended robust result. L2 regularization in the case of ordinary linear regression can be interpreted as placing a Gaussian prior on the coefficients. Without the regularization, Applicant determined that the coefficients may not produce reasonable results, as the indicator coefficients can sometimes have too much weight associated with them. With regularization, Applicant determined that the estimates appear to be more objectively reasonable.

Once the regression model is determined by the machine learning module 30, intermediate values can be determined, including:

-   -   ŝales_(actual): the model's prediction of the actual sales during the promotion period. This may require summing over the day-of-week values predicted by the model. It should thus be relatively close to the actual sales during the promotion period.
    -   ŝales_(no_promo): the total estimated sales as predicted by the model, but with one subtracted from promos_(promo(dow)). As such, this can estimate what the sales would have been if the current promotion did not run.

The promotion lift can then be determined by the machine learning module 30 as:

$\text{PromotionLift} = \text{sales}_{\text{actual}} \cdot \frac{\widehat{\text{sales}}_{\text{no\_promo}}}{\widehat{\text{sales}}_{\text{actual}}}$

The above approach can be used instead of a direct computation of the ‘actual sales’ minus ‘sales with no promotion’ because the estimate for no promotion is generally on a different scale than the actual sales. Specifically, if the estimate ŝales_(actual) is off from sales_(actual), then this bias will be included in the estimate of promotion lift. Instead, the relative change in ‘actual vs. no promotion’, which is predicted by the model, is extracted and used to adjust the actual sales.
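A worked instance of the expression above may help. The sketch below evaluates the expression as it is written, using hypothetical figures; the function name and the numbers are assumptions made for illustration only, and the sketch makes no further claim about how the resulting quantity should be interpreted.

```python
def ratio_adjusted_sales(sales_actual, predicted_actual, predicted_no_promo):
    """Scale observed sales by the model's predicted no-promotion/actual ratio.

    Only the ratio of the two model predictions is used, so a model that is
    off by a constant factor in both predictions does not change the result.
    """
    if predicted_actual <= 0:
        raise ValueError("predicted actual sales must be positive")
    relative_change = predicted_no_promo / predicted_actual
    return sales_actual * relative_change


# Hypothetical figures: the model under-predicts actual sales (1000 vs 1200),
# and predicts 800 with one fewer promotion in the promotion period.
value = ratio_adjusted_sales(
    sales_actual=1200.0, predicted_actual=1000.0, predicted_no_promo=800.0
)

# Doubling both predictions leaves the ratio, and therefore the result, unchanged.
value_rescaled = ratio_adjusted_sales(
    sales_actual=1200.0, predicted_actual=2000.0, predicted_no_promo=1600.0
)
```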

In another particular case, price elasticity analytics can be forecasted as a measure of the sensitivity of unit sales to changes in price. In this example, the model for price elasticity that is used is the multiplicative model. The multiplicative model can model demand using the following regression model:

-   log(Q_(i,t)) = η log(P_(i,t)) + β₁promos_(i,t) + β₂I(Jan(t)) + . . . + β₁₃I(Dec(t)) + β₁₄I(store_format₀(i)) + . . . + β_(14+k−1)I(store_format_(k−1)(i))

Where:

-   -   Q_(i,t): is the normalized units for week t for store format i (normalized by, for example, average weekly unit sales);
    -   P_(i,t): is the normalized average price for week t for store format i (normalized by, for example, average weekly price);
    -   I(Jan(t)); . . . ; I(Dec(t)): are the set of indicator variables (1/0) indicating whether t falls in that month;
    -   I(store_format₀(i)); . . . ; I(store_format_(k−1)(i)): are the set of indicator variables (1/0) indicating whether store i is of the given store format; and
    -   η: is the price elasticity.

η is the price elasticity, which follows from the definition of point price elasticity:

$E_{d} := \frac{P}{Q}\frac{dQ}{dP}$

$\log Q = \eta \log P + \ldots$

$\frac{d \log Q}{dP} = \frac{d\left( \eta \log P + \ldots \right)}{dP}$

$\frac{1}{Q}\frac{dQ}{dP} = \frac{\eta}{P}$

$\eta = \frac{P}{Q}\frac{dQ}{dP} = E_{d}$

Similar to above, L2 regularized regression (ridge regression) can be used to get an intended robust result.

E_(d) can be interpreted as:

-   E_(d)=0: Perfectly inelastic demand;
-   −1<E_(d)<0: Inelastic or relatively inelastic demand;
-   E_(d)=−1: Unit elastic demand (a one percent change in price results in a one percent change in quantity);
-   −∞<E_(d)<−1: Elastic or relatively elastic demand; and
-   E_(d)>0: Inverse demand relationship (demand increases with price).
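The interpretation above can be expressed as a small lookup, shown here as an illustrative sketch. The function name and the returned labels are hypothetical.

```python
def interpret_elasticity(e_d):
    """Map a point price elasticity E_d to the categories listed above."""
    if e_d > 0:
        return "inverse demand relationship"
    if e_d == 0:
        return "perfectly inelastic"
    if e_d > -1:
        return "inelastic"
    if e_d == -1:
        return "unit elastic"
    return "elastic"


for value in (0.3, 0.0, -0.5, -1.0, -2.0):
    print(value, interpret_elasticity(value))
```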

In another particular case, the promotion forecasting model can use at least one of an average price model and a regression model. The average price model can predict effective average price using promotion mechanics. Effective average price is understood to mean an observed average price (total sales divided by total units). This price may not always match the promotion price due to various factors such as a promotion requiring a trigger (for example, buy 2 get 1 free), or due to small differences in the stores where the product is on sale. The regression model can predict demand using covariates such as, for example, average price from the average price model, relevant additional promotion mechanics, and any other relevant causal factors.

The reason for the double-pronged promotion forecasting model can be two-fold. First, price (or effective price) is generally the primary driver of sales; so translating promotion mechanics into an average price can work particularly well. Second, causal factors generally affect the promotional demand more than trend behavior. Applicant determined this is because, when a promotion occurs, the promotion is typically a step function (not on sale to on sale). Additionally, relevant seasonal terms or long term trends can be easily encoded within a regression problem.

In an example of system 10, data can be aggregated for each week on a total-store basis. In this way, a feature table can be indexed by SKU, week, and the like, with a real number for each of the resulting feature columns.

In an example, there are various types of promotion mechanics that may be used by the system 10. For instance, there can be a direct price discount, such as a price change from $100 to $80. There can be a unit incentive, such as buy 2 units and get 1 unit free. There can be a quantity discount, such as buy 5 units for $10 each instead of a typical price of $12 per unit. There can be a percentage quantity discount, such as buy 5 units and get 25% off the purchase price.

In further examples, there may be more complex types of promotion mechanics. For instance, there can be group promotions, such as if a customer buys 3 units of any of products A, B, or C, they can get 1 unit free of any of products A, B, or C. There can be gift promotions, such as if a customer buys 3 units of any of products A, B, or C, they can get product D for free as a gift. There can be member promotions, such as a customer can get a discounted price on product A if they are a member of the store's mailing list.

In further cases, any combination of promotion mechanics may be stacked together as a combined promotion.

The machine learning module 30 can use raw promotion data to encode each one of the promotion mechanic types into several columns. For example, for a “Buy 2 units, get 1 unit free” promotion, the machine learning module 30 can use the following columns to encode it:

-   buy condition=units;
-   threshold=3;
-   reward=1;
-   reward type=free item; and
-   item=SKU.

In general, one column (e.g. buy condition) can determine the interpretation of another column (e.g. threshold). To resolve this issue, a set of columns for each of the different types of promotions can be created. For the above example (type 001, non-group promotion), there can be:

-   001_threshold_unit_non_group: real value if threshold for purchase is in units promotion subtype;
-   001_threshold_value_non_group: real value if threshold for purchase is in dollars promotion subtype;
-   001_reward_unit_non_group: indicator if reward is in units promotion subtype;
-   001_reward_value_non_group: indicator if reward is in units promotion subtype;
-   001_percentage_discount_unit_non_group: computed relative discount if units promotion subtype; and
-   001_percentage_discount_unit_non_group: computed relative discount if units promotion subtype.

The machine learning module 30 can similarly repeat the column creation for other types of promotions. An advantage of this type of encoding scheme is that such scheme can deal with stacking promotions of different types on the same SKU, at the same time; since it is generally not an issue if different columns for different types of promotions overlap. If two promotions of the same type are stacked, the machine learning module 30 can aggregate the duplicated rows (e.g. take an average), which, in some cases, works out in the regression (half-way between the two promotions).
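As a non-limiting sketch of this encoding scheme, each promotion can be mapped to columns prefixed by its type, and stacked promotions on the same SKU and week can be merged by averaging any columns that collide. The helper names, the promotion type "002", and the sample values are hypothetical, and averaging only over the rows in which a column appears is an assumption of this sketch.

```python
def encode(promo_type, is_group, mechanics):
    """Prefix each mechanic column with its promotion type and group flag."""
    suffix = "group" if is_group else "non_group"
    return {f"{promo_type}_{name}_{suffix}": float(value)
            for name, value in mechanics.items()}


def stack(encoded_rows):
    """Merge promotions for one (SKU, week); average columns that collide."""
    sums, counts = {}, {}
    for row in encoded_rows:
        for column, value in row.items():
            sums[column] = sums.get(column, 0.0) + value
            counts[column] = counts.get(column, 0) + 1
    return {column: sums[column] / counts[column] for column in sums}


# Two stacked promotions of type 001 and one of a hypothetical type 002.
buy2_get1 = encode("001", False, {"threshold_unit": 3, "reward_unit": 1})
buy4_get1 = encode("001", False, {"threshold_unit": 5, "reward_unit": 1})
spend_50 = encode("002", False, {"threshold_value": 50})
print(stack([buy2_get1, buy4_get1, spend_50]))
```

Columns belonging to different promotion types never collide, so only same-type stacking triggers the averaging step.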

In further cases, the machine learning module 30 can use other features in the promotion forecasting model.

The average price model can be used to approximately predict the average effective discounted price (called “avg price”) as an outcome variable from the promotion mechanics. The prediction can use a machine learning model, such as a Random Forest model, such that: avg price=RandomForest(promotion_mechanics). In some cases, the prediction can be determined for each category separately.

Generally, explicitly mapping promotion mechanics to a predicted promotion price is complicated by the fact that there can be a copious number of fields. Accordingly, the machine learning module 30 can use a machine learning model that can effectively learn this mapping function, while being able to interpolate between unseen corner cases; so long as the given category has enough training data. Each row of the training data can be formulated as:

-   avg price: average price of SKU divided by regular price (value between 0.0 and 1.0) for a given time period (outcome variable); and
-   promotion mechanics: as described herein, SKUs with stacked promotions are averaged to produce a single row for each (SKU, time period) pair.

The test data is formulated by taking the promotion mechanics for the target time period to get a predicted avg price of between 0.0 and 1.0 of the regular price (non-promotion price). To determine the predicted average price, the raw model output can be multiplied by the regular price. Since the output is with respect to the regular price, data from any number of differently priced SKUs can be pooled, which can partially solve the data sparsity problem for any single SKU. In some cases, there is some randomness (i.e. the mapping is non-deterministic) when translating promotion mechanics to average price because some mechanics need to be triggered; for example, in a buy two and get one free promotion mechanic, some people will only buy one and thus never trigger the promotion.
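The normalization by regular price can be sketched as below, assuming hypothetical helper names and sample prices. The training outcome is the observed average price as a fraction of the regular price, and a prediction is converted back to a price by multiplying the raw model output by the regular price.

```python
def normalized_avg_price(total_sales, total_units, regular_price):
    """Training outcome: effective average price as a fraction of regular price."""
    return (total_sales / total_units) / regular_price


def predicted_average_price(raw_model_output, regular_price):
    """Convert a model output in [0.0, 1.0] back into a price."""
    if not 0.0 <= raw_model_output <= 1.0:
        raise ValueError("model output is expected to lie between 0.0 and 1.0")
    return raw_model_output * regular_price


# Two hypothetical SKUs with different regular prices yield the same
# normalized outcome, which is what allows their rows to be pooled.
print(normalized_avg_price(800.0, 10, 100.0))
print(normalized_avg_price(80.0, 10, 10.0))
print(predicted_average_price(0.8, 100.0))
```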

Once the average price model has been fit on an entire category, in some cases, the regression promotion forecasting model can be used by the machine learning module 30 to train models. The regression promotion forecasting model can be trained both on a per SKU and, in some cases, on a per brand or subcategory level to predict demand. The latter can be used to determine predictions on a per-SKU level, but trained on a group of SKUs within the same brand or subcategory. This can be useful to, for example, help fill in missing data and deal with data sparsity. Using each model, a set of features can be inputted and a prediction of the unit demand for a given set of features can be outputted, as in:

-   units=Model(avg price, feature₁, . . . , feature_(k))

In some cases, an ensemble approach can be used by the machine learning module 30 because, as determined by the Applicant, such approach tends to be the most empirically accurate. Applicant has determined that some models do extremely well on many cases but falter on individual corner cases. As such, an ensemble approach can be used to mitigate this problem. In an example, an ensemble approach can include any one or more of:

-   training multiple models on the same SKU (or a group of SKUs in the same brand or category), where for each training the covariates or features, as well as the model type, are varied (the model type can be, for example, Ridge Regression, Random Forest, or the like);
-   determining a prediction using each of the trained models; and
-   determining a median value of all the predictions.
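The final two steps above can be sketched as follows, assuming each trained model is available as a callable that maps a feature dictionary to a unit forecast. The stand-in models and their coefficients are hypothetical and are not trained models.

```python
from statistics import median


def ensemble_forecast(models, features):
    """Run every model on the same features and return the median prediction."""
    return median(model(features) for model in models)


# Hypothetical stand-ins for trained models; the third one behaves like a
# model that falters on a corner case.
models = [
    lambda f: 100.0 - 40.0 * f["avg_price"],
    lambda f: 95.0 - 30.0 * f["avg_price"],
    lambda f: 500.0,
]
print(ensemble_forecast(models, {"avg_price": 0.8}))
```

Taking the median rather than the mean keeps a single outlying model from dominating the combined forecast.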

Any one SKU, even those with large transaction histories, typically will not have had the entire breadth of promotion mechanics in its history. Thus, in some cases, it can be advantageous to be able to pool data from multiple similar SKUs. In this way, effects can be estimated for a target SKU, even though that SKU may not have had those effects directly observed for it. However, there may be some complications of this approach, for example:

-   the units outcome variable can have largely different magnitudes for different SKUs, even for SKUs in the same subcategory or brand;
-   similarly, the price outcome variable can be significantly different between different SKUs; and
-   different SKUs may respond noticeably differently to different promotion mechanics.

The Applicant determined that the above complications for pooling data can be overcome by normalizing the units and average price within a subcategory or brand such that the data may be reliably pooled together. In some cases, depending on the model, indicator variables can also be incorporated into the model for each SKU. For example, for each SKU, absolute units can be normalized by a mean for non-promotion periods and, if such mean is not available, by the mean of the SKU's entire history. When determining a prediction, the raw normalized model output data is multiplied by this scaling factor to get the prediction in terms of units. In some cases, average price is already normalized as described in the average price model. However, the predicted average price can be un-normalized by multiplying the average price by the regular price.
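A minimal sketch of the unit normalization described above is given below. The function names, the promotion flags, and the sample history are hypothetical.

```python
from statistics import mean


def scaling_factor(units, on_promotion):
    """Mean units over non-promotion periods, else over the entire history."""
    non_promo = [u for u, promo in zip(units, on_promotion) if not promo]
    return mean(non_promo) if non_promo else mean(units)


def normalize(units, factor):
    """Express each period's units relative to the SKU's scaling factor."""
    return [u / factor for u in units]


def to_units(normalized_prediction, factor):
    """Convert a raw normalized model output back into units."""
    return normalized_prediction * factor


history = [10, 30, 20]                 # weekly units for one SKU
promo_flags = [False, True, False]     # week 2 was a promotion week
factor = scaling_factor(history, promo_flags)
print(factor, normalize(history, factor), to_units(2.0, factor))
```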

With respect to the Random Forest model employed by the machine learning module 30, in some cases, there can be two different Random Forest models with different covariates. A first model, called “Model A”, can have features of, for example, avg price, SKU age, star buy, feature promotion, store segment, points promotion, and the like. In this case, the model can be trained on a per-SKU basis. Another model, called “Model B”, can have features of, for example, avg price, number of stores, number of active promotions, star buy, shelving, store segment, feature promotion, seasonality, points promotion, and the like. In this case, the model can be trained on a pooled SKU basis, such as across a subcategory or brand. In some cases, where there is training across multiple SKUs, there may be an indicator variable for each SKU.

Due to the nature of Random Forest models, these models can be better at interpolating rather than extrapolating data. Accordingly, these models are useful for “memorizing” past promotion information. These models are particularly advantageous when there is a lot of previous data (for example, many different price points) and a forecasted prediction resembles past behavior.

In a particular embodiment, the machine learning module 30 can use a Ridge Regression model as a base model. Generally, Ridge Regression can be interpreted as a simple linear regression with a zero-mean normally distributed Bayesian Prior. The machine learning module 30 can extend this model to handle certain additional situations.

First, the machine learning module 30 can “cap” the forecast at either a) mean+3*standard deviation, or b) a previous maximum. This cap may be needed because, in some cases, the regression estimates will extrapolate a demand that is a complete outlier. This cap can handle such a corner case.
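As an illustrative sketch, the cap can be applied as below. The function name and the sample history are hypothetical, and the use of the population standard deviation is an assumption, since the description does not specify which estimator is used.

```python
from statistics import mean, pstdev


def cap_forecast(forecast, history, use_previous_max=False):
    """Limit a forecast to mean + 3 * standard deviation, or to the prior maximum."""
    if use_previous_max:
        cap = max(history)
    else:
        # Assumption: population standard deviation of the historical demand.
        cap = mean(history) + 3 * pstdev(history)
    return min(forecast, cap)


history = [10, 12, 8, 10]
print(cap_forecast(100.0, history))                         # capped
print(cap_forecast(11.0, history))                          # left unchanged
print(cap_forecast(100.0, history, use_previous_max=True))  # capped at 12
```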

Second, the machine learning module 30 can impose a non-zero mean Bayesian Prior that can help fill in missing or sparse data when making an estimate with respect to a single SKU. In some cases, the machine learning module 30 can implement a MAP estimation for a Bayesian prior using a Ridge Regression with Bayesian Prior.

Fitting a regression model on the pool of SKUs can advantageously provide coverage for all, or a significant portion, of possible covariates. In some cases, the coefficients from this model become the mean of the Bayesian Prior. The machine learning module 30 can use such coefficients as Bayesian Priors on per-SKU regression models. In cases where a SKU-level model has enough data to “override” the Bayesian Prior, then the estimate will typically be the same as without the Bayesian Prior. Otherwise, the Bayesian Prior can be used to fill the missing coefficient. The above approach can be viewed as an empirical Bayes approach. Such approach is made possible because of the normalization described above due to fitting the pooled SKUs model.
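The effect of a non-zero mean prior can be illustrated with a deliberately reduced sketch: a single coefficient, no intercept, and a penalty `lam` that pulls the per-SKU coefficient toward the pooled coefficient `prior_mean`. All names and figures are hypothetical; minimizing Σ(y − βx)² + lam·(β − prior_mean)² gives the closed form used below.

```python
def map_coefficient(x, y, prior_mean, lam):
    """MAP estimate of beta for y = beta * x with a Gaussian prior centered on prior_mean."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    if sxx + lam == 0:
        raise ValueError("need data or a positive penalty")
    return (sxy + lam * prior_mean) / (sxx + lam)


pooled_beta = 5.0  # hypothetical coefficient from the pooled-SKU model

# No SKU-level data: the prior fills in the missing coefficient.
print(map_coefficient([], [], pooled_beta, lam=1.0))

# Some SKU-level data: the estimate sits between the prior and the data.
print(map_coefficient([1, 2, 3], [2, 4, 6], pooled_beta, lam=14.0))

# Plenty of SKU-level data: the data override the prior.
many_x = [1, 2, 3] * 1000
many_y = [2, 4, 6] * 1000
print(map_coefficient(many_x, many_y, pooled_beta, lam=14.0))
```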

In a particular case, the machine learning module 30 can use two Ridge Regression models. A first model, called “Model A”, can have features of, for example, avg price, SKU age, star buy, feature promotion, points promotion, and the like. A second model, called “Model B”, can have features of, for example, avg price, stock, seasonality, store segment, shelving, star buy, feature promotion, points promotion, and the like. Both Model A and Model B can be trained on pooled SKUs based on subcategory or brand. In a particular case, an outcome variable can be the log(units); which can advantageously model the relationship with price more accurately because the relationship between price and demand is typically non-linear.

The ensemble approach can advantageously provide better accuracy when compared to other singular models. As an example, a simpler variant of Model A comprising just avg price and SKU age has shown suitable results overall because most items were driven by avg price, with a trend downward as the SKU aged. However, the simpler variant of Model A did not cover all cases, particularly where other covariates have a greater contribution.

In a particular embodiment of system 10, the system 10 can include a confidence module 40. For each one of the forecasts determined by the machine learning module 30, the confidence module 40 can provide a confidence indicator to indicate the reliability of such forecast. The confidence indicator can be, for example, a score (such as a score out of 10), a percentage, a colour marker (such as green, yellow, and red), or the like. The confidence module 40 determines the confidence indicator using an exemplary two-step process.

The confidence module 40 determines if each SKU is in scope, where out of scope SKUs are automatically marked as such. Determining if the SKU is in scope includes using business rules that determine whether aspects of a product are acceptable; such as determining whether the product is supported and not discontinued.

The confidence module 40 then uses a confidence machine learning model that is trained on previous SKU forecasts to determine the accuracy of previous SKU forecasts versus their actualized values. In an example, the confidence model can have the form:

-   forecast error=Regression Model(SKU features)
-   forecast error:=log(1+((RK forecast units)/(actual units+1)))

In this case, the outcome variable is just the logarithm of the ratio of the units. The “+1”s are provided to avoid corner cases, such as dividing by zero and log(0). The data used to train the confidence model is taken from previous promotions. Examples of features that can be used with the confidence model include: mean or standard deviation of a product's price; units; stockouts; number of promotions; number of stores at different time periods; and the like. The confidence module 40 can use several ratios of the above features. The confidence module 40 can also use other indicator features for sub-category, product status, or the like.
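The outcome variable of the confidence model can be computed as in the following sketch, which implements the expression above directly. The function name and the sample figures are hypothetical.

```python
import math


def forecast_error(forecast_units, actual_units):
    """log(1 + forecast / (actual + 1)); the +1 terms avoid division by zero and log(0)."""
    return math.log1p(forecast_units / (actual_units + 1))


print(forecast_error(0, 0))    # both corner cases at once
print(forecast_error(10, 9))   # forecast / (actual + 1) equals 1
```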

In some cases, the confidence module 40 can be used for cutting off forecasts if the confidence in the prediction is too low; for example, where the forecast is below a given confidence score.

In further cases, confidence can be determined by the confidence module 40 using other metrics, for example, model fit metrics, mean absolute percentage error, or the like.

In some embodiments, as shown in the method 1000 of FIG. 10, the machine learning module 30 can determine a per-store unit demand forecast model. This type of forecast can be advantageous because when ordering inventory, supply chain managers typically require a per-store unit demand forecast. At block 1002, historical data related to one or more products is received, the historical data comprising historical inventory level of the one or more products at a retail store. At block 1004, the machine learning module 30 builds a machine learning model, as described herein, to forecast a ratio for total store level unit demand. At block 1006, the machine learning module 30 also builds a machine learning model to predict the total store demand for a particular product. At block 1008, the machine learning module 30 can determine a per-store SKU forecast by multiplying the total store level unit demand by the total store level SKU forecast. This forecast analytic can then be outputted at block 1010. This approach can be advantageous because each individual store may not sell many units in any given time period. This makes it very difficult to predict a given store's unit demand, whereas at a total store level, the demand is much more predictable.

The per-store unit demand forecast model makes use of the idea that, for a given SKU, per-store proportions can be derived from, for example, a 4 week average of unit sales per store, and those proportions can then be multiplied by the total-store forecast. In an example, this model (per SKU) can be formalized as:

$\text{?} = \text{?}$

$\text{proportion for store } i = \frac{\text{?}}{\sum\limits_{i = 1}^{n}\text{?}}$

? indicates text missing or illegible when filed

where y_(i) is the unit demand for store i.

In an example, the above equation is equivalent to a 4 week average assuming that the model is only trained with data representing the past 4 weeks of unit demand. This is due to the constant coefficient being the average of the 4 week demand, as the proportion is the relative unit demand of this average.

In further cases, there may be variations of the per-store unit demand forecast model. A first variation can be adding an additional covariate for the percent number of days stock out (per store). The reasoning behind this variation is that if a store stocks out of an item, the above equation will under stock the item, causing the store to stock out of the product. This can cause the store to then under or over order, possibly leading to a cycle of out of stock situations. In this variation, by adding a covariate for stock out, some of these situations can be absorbed to ensure that the forecasted average is closer to the actual demand. This variation can be formalized as:

$\text{?} = \text{?}\,\text{stockout}_{i} + \text{?}$

$\text{proportion for store } i = \frac{\text{?}}{\sum\limits_{i = 1}^{n}\text{?}}$

? indicates text missing or illegible when filed

In another variation, the above equation can be used, except that instead of using raw units, proportions for a given week are used. The proportions are called r_(i) for a given store i. In this case, the training points can be varied to any suitable time period. This variation can be formalized as:

$\text{?} = \text{?}\,\text{?} + \text{?}$

$\text{proportion for store } i = \frac{r_{i}}{\sum\limits_{i = 1}^{n} r_{i}}$

? indicates text missing or illegible when filed

In some cases, the proportion of outputs for each store can be adjusted by a range of view. Assume there is an indicator for the range of view called v_(i), which is 1 if store i is in range of view for the given time period, and 0 otherwise. This can be formalized as:

$\text{adjusted proportion for store } i = \frac{\text{?}\,\text{?}}{\sum\limits_{i = 1}^{n}{\text{?}\,\text{?}}}$

? indicates text missing or illegible when filed
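Because parts of the filed equations are illegible, the following is only a sketch under stated assumptions: each store's weight is taken to be its average unit demand over the training window, the range-of-view indicator v_(i) multiplies that weight, and the proportions are normalized to sum to one. The function names and the sample figures are hypothetical.

```python
def store_proportions(store_demand, in_view=None):
    """Share of total demand per store, zeroing stores outside the range of view."""
    if in_view is None:
        in_view = [1] * len(store_demand)
    weighted = [d * v for d, v in zip(store_demand, in_view)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("no demand among stores in the range of view")
    return [w / total for w in weighted]


def per_store_forecast(total_store_forecast, proportions):
    """Split a total-store forecast across stores using their proportions."""
    return [total_store_forecast * p for p in proportions]


avg_demand = [10, 30, 60]  # e.g. 4 week average units for three stores
print(store_proportions(avg_demand))
print(store_proportions(avg_demand, in_view=[1, 0, 1]))
print(per_store_forecast(200, store_proportions(avg_demand)))
```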

Turning to FIG. 4, an example promotion forecast is shown, according to embodiments described herein. The example promotion forecast can be output to a user via the output interface 18. In this example, forecasts for a four-week promotion of a discounted product are provided.

Advantageously, the embodiments of system 10 are intended to provide more accurate forecasts using multiple factors (for example, past sales, trends, price, promo mechanics, and the like). Further, the embodiments of system 10 are intended to advantageously provide automated and consistent forecast methodology and confidence indicators for each forecast. This is intended to provide for a reduction in stockouts and excess inventory, and prevent negative short term or long term financial impacts.

In a further embodiment, the machine learning module 30 can be used to provide insights on previous promotions. Insights can include, for example, promotion lift, cannibalization, halo effect, pull forward, price elasticity of demand, and the like.

The machine learning module 30 uses the machine learning model to evaluate and measure past promotions. This can be advantageous because it can provide a comprehensive, fully data-driven view of past performance of promotions. In some cases, the evaluation is advantageously based on total uplift, taking into account baseline sales, and the negative effects of cannibalization, halo, and pull forward. Thus, advantageously, uplift is determined not just based on raw sales. In further cases, the evaluation can also include: promotion lift as a measure of incremental promotional lift of the promotion in comparison to a baseline; price elasticity as a measure of impact of price changes to demand; residual basket value as a measure of average basket size when this product is sold, minus the product; basket penetration as a measure of the proportion of transactions involving the product; item importance as a measure of impact that a product has on a category; and customer centric determination as a measure of the effect of the promotion across different customer groups.

FIG. 5 shows an example output of the evaluation of past promotions, as output to a user via the output interface 18.

In exemplary embodiments of the system 10 described herein, as shown in FIG. 6, and as follows, there is provided a system 100 for optimizing promotional materials. In an embodiment, the system 100 is run on a server. In further embodiments, the system 100 can be run on any other computing device; for example, a desktop computer, a laptop computer, a smartphone, a tablet computer, a point-of-sale (“PoS”) device, a smartwatch, or the like. The system 100 enables the generation of optimized promotional materials that can then be delivered to recipients physically, for example through the mail, or digitally, for example via e-mail.

As part of determining aspects of a promotion, a company may generate promotional materials (also called advertisements). Advertisements are a primary resource for companies to get information about their product(s) out to consumers or other businesses, as the case may be. However, the promotional area on these materials is often limited. The promotional area can be physically limited; for example, a limited amount of space on each page and/or a limited number of pages. The promotional area can also be limited in that a customer has a limited attention span for reading the promotional materials; for example, if the materials are too long, too dense or overly comprehensive, the reader will often forgo reading portions of the promotional materials. In order to optimize the effect of the promotional materials on an intended audience, the system 100 can make determinations for a variety of attributes of the promotional materials; for example, which products to put on the advertisement material, the distribution of the products on the advertisement material, the space encompassed by promotion of the product, and what aspects of the product to highlight. In an embodiment, the system 100 can then generate promotional materials based on the above determinations.

In a further embodiment, the system 100 can make a determination as to who is the intended audience and who should receive the promotional materials.

FIG. 6 further shows various physical and logical components of an embodiment of the system 100. As shown, the system 100 has a number of physical and logical components, including a central processing unit (“CPU”) 102, random access memory (“RAM”) 104, an input interface 106, an output interface 108, a network interface 110, non-volatile storage 112, and a local bus 114 enabling CPU 102 to communicate with the other components. CPU 102 executes an operating system, and various modules, as described below in greater detail. RAM 104 provides relatively responsive volatile storage to CPU 102. The input interface 106 enables an administrator or user to provide input via an input device, for example a keyboard and mouse. The output interface 108 outputs information to output devices, such as a display and/or speakers. The network interface 110 permits communication with other systems, such as other computing devices and servers. Non-volatile storage 112 stores the operating system and programs, including computer-executable instructions for implementing the operating system and modules, as well as any data used by these services. Additional stored data, as described below, can be stored in a database 116. During operation of the system 100, the operating system, the modules, and the related data may be retrieved from the non-volatile storage 112 and placed in RAM 104 to facilitate execution. In an embodiment, the system 100 further includes a machine learning module 120, an adjustment module 124, and a generation module 122.

To generate the optimized promotional materials, the system 100 performs an analysis of historical data to find an optimized configuration for the promotional materials. The historical data can be gathered from a variety of sources. The historical data may include previous promotional materials, and the results of distributing such previous promotional materials. For example, the historical data can include the cost of generating the previous promotional materials, the cost of distributing the previous promotional materials, and product-level results of the previous promotional materials. The product-level results can be, for example, the area on the promotional materials occupied by a certain product, the location in the previous promotional materials of the certain product, the promotional aspect advertised with respect to the certain product (such as the sale price), how many units of the product were sold in a selected period after distribution of the promotional materials, how much cannibalization occurred on other products due to the advertising of the certain product, how much halo effect occurred (i.e. effect of promotion on one item influencing customers to purchase other items), and the like.

The historical data can be gathered from a variety of sources, for example, through the interaction with digitized point-of-sale machines, loyalty programs, digital communication channels, databases of previous promotional materials and various other means. The historical data can be gathered on a product-by-product basis; or on a basis of promotional materials, whereby in some cases the product level information can be extrapolated. The various product data can be collected, for example, through PoS terminals, recurring billing (in the case of contractual services), and e-commerce websites of the transactional variety, and through other means such as market research, 3rd party aggregation, and other data resellers and brokers. The system 100 draws upon this product data from the various channels in order to better understand the behavior of each customer. The system 100 mines this data and applies statistical and machine learning approaches to yield recommendations and actions that enable the optimization of promotional materials.

The solution provided by the system 100 is one that allows generation of promotional materials that are optimized. The system 100 uses a combination of constrained optimization, prediction, and reinforcement learning to directly optimize the generation of promotional materials. For example, the optimization can be directed to any output measure, such as revenue uplift, unit sales uplift, profitability uplift, or some combination thereof. The system 100 leverages the historical data to make predictions about what products and promotions should be placed on the promotional materials to optimize the results of the promotional materials.

Turning to FIG. 8, an exemplary embodiment of the method 50 described herein is shown in a flowchart for a method 300 for optimizing promotional materials.

At block 302, the system 100 receives input parameters from a user via the input interface 106. The input interface 106 can include, for example, a keyboard, a mouse, a touchscreen, or the like. The input parameters can include, for example, the date for distribution of the promotional materials, the desired length of product promotions, the size of the promotional materials, the length of the promotional materials, the type of distribution of the promotional materials, the geographic scope of the promotional materials, the desired sales outcome or margin for a product, products, or the promotional materials as a whole, or the like.

At block 304, the machine learning module 120 selects which products are to be included in the promotional materials based on a machine learning model. The selection of products can include determining which products, from a predetermined roster of products, are optimally ready to be promoted.

At block 306, the machine learning module 120 selects the configuration and layout of the products on the promotional materials based on the machine learning model. In one case, the machine learning module 120 determines which products are required to be featured more prominently in the promotional materials, and can determine a hierarchy of product prominence. The hierarchy can be determined by, for example, determining which products are required to be sold more readily and which products are more likely to entice a consumer to read through the promotional materials. The machine learning module 120 can also assign a weight to each of the products, called a prominence weight, that determines the relative prominence of each product. The prominence weight can be normalized, for example out of 100, such that the machine learning module 120 can find relative prominence weight for all the products in the promotional materials. In a certain case, the prominence weight can be normalized for each page in a multi-page advertising material.
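As a non-limiting sketch, normalizing prominence weights out of 100 and ordering the products into a hierarchy can be expressed as follows. The function names, product labels, and raw weights are hypothetical; how the raw weights are produced by the machine learning model is outside the scope of this sketch.

```python
def normalize_prominence(raw_weights, total=100.0):
    """Rescale raw prominence weights so that they sum to `total`."""
    weight_sum = sum(raw_weights.values())
    if weight_sum <= 0:
        raise ValueError("raw prominence weights must sum to a positive value")
    return {product: total * w / weight_sum for product, w in raw_weights.items()}


def prominence_hierarchy(normalized):
    """Order products from most to least prominent."""
    return sorted(normalized, key=normalized.get, reverse=True)


page = {"product A": 3, "product B": 1, "product C": 4}
weights = normalize_prominence(page)
print(weights)
print(prominence_hierarchy(weights))
```

Applying the same function separately to the products on each page gives the per-page normalization mentioned above.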

The machine learning module 120 can then determine the block structure and product layout on the promotional materials based on the machine learning model. FIG. 9 illustrates an example of a block structure of a page of an advertising material 400. In this example, the topmost block is a title block 402 and does not include a product, but rather has a title and/or identifying information about the entity that is distributing the advertisement material. The remaining blocks, product blocks 404 to 430, contain product related information. The product related information can be, for example, a picture of the product, a price of the product, a quantity limit of the product, noteworthy aspects regarding the product, product code, or the like.

Based on the normalized prominence weight, the machine learning module 120 can determine the size of the product blocks 404 to 430 for each product. In the example of FIG. 9, product block 412 includes a product having the highest prominence weight, and accordingly, product block 412 has the largest area of any of the product blocks 404 to 430. Product block 420 includes a product with the next greatest prominence weight, and accordingly, product block 420 has the second largest area of any of the product blocks 404 to 430. The areas of the rest of the product blocks 404 to 410, 414 to 418, and 422 to 430 are likewise determined based on the prominence weight of the product included in the product block.

In the present embodiment, the machine learning module 120 can also make determinations regarding the location of each of the product blocks 404 to 430 on the advertising material based on the machine learning model. In the example of FIG. 9, different areas of the page of the advertising material can be given different relative weightings. For example, the area closer to the center of the page can be given higher relative weight than the peripheral corners of the page. In another example, the area closer to the top of the page can be given higher relative weight than the area on the bottom of the page. In the example of FIG. 9, the location of product block 412 has the highest location weighting due to its central location, and product block 406 has a higher location weighting than product block 428 due to its higher location. In further examples, location weighting can be based on statistical considerations of where on the page a reader's eye typically looks and pays more attention.
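
The location weighting described above can be sketched as a score that blends closeness to the page center with closeness to the top of the page. The blend, the coordinate convention (x and y in the range 0 to 1, with y = 0 at the top), the slot coordinates, and the greedy pairing of the most prominent product with the best location are all assumptions made for illustration.

```python
import math


def location_weight(x, y, center_bias=0.5, top_bias=0.5):
    """Score a page position; higher means a more valuable location.

    x, y: block center in page-relative coordinates, y = 0 at the top.
    """
    max_dist = math.hypot(0.5, 0.5)               # center to corner
    centrality = 1.0 - math.hypot(x - 0.5, y - 0.5) / max_dist
    topness = 1.0 - y
    return center_bias * centrality + top_bias * topness


def assign_slots(prominence, slots):
    """Pair products with slots, best product to best location (greedy)."""
    ranked_products = sorted(prominence, key=prominence.get, reverse=True)
    ranked_slots = sorted(slots, key=lambda s: location_weight(*slots[s]),
                          reverse=True)
    return dict(zip(ranked_products, ranked_slots))


# Hypothetical slot centers loosely following FIG. 9.
slots = {"412": (0.5, 0.5), "406": (0.5, 0.1), "428": (0.5, 0.9)}
layout = assign_slots({"p1": 50, "p2": 30, "p3": 20}, slots)
```

A statistically derived attention map, as mentioned in the further examples, could replace `location_weight` without changing `assign_slots`.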

In some cases, the product blocks 404 to 430 can include two or more products in one product block. In this case, the machine learning module 120 can average the prominence weight for the two or more products included in that one product block, and the prominence weights can be normalized accordingly.
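
A minimal sketch of this averaging follows, assuming that "normalized accordingly" means the per-block averages are rescaled to sum to 100. The function name `block_prominence` and the sample data are hypothetical.

```python
from statistics import mean


def block_prominence(block_products, product_weights, total=100.0):
    """Average product weights within each block, then renormalize per block.

    block_products: dict mapping block id -> list of product ids in the block.
    product_weights: dict mapping product id -> prominence weight.
    """
    averaged = {block: mean(product_weights[p] for p in products)
                for block, products in block_products.items()}
    scale = sum(averaged.values())
    if scale <= 0:
        raise ValueError("block averages must sum to a positive value")
    return {block: total * weight / scale for block, weight in averaged.items()}


# Block 404 holds two products; block 406 holds one.
weights = block_prominence({"404": ["x", "y"], "406": ["z"]},
                           {"x": 30, "y": 10, "z": 60})
```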

At block 308, a generation module 122 generates the promotional materials using the block structure and product layout determined by the machine learning module 120. The promotional materials are then printable, sendable, or otherwise available for distribution via the output interface 108.

In further embodiments, the machine learning module 120 determines the block structure and product layout on multiple pages of an advertising material. For example, the machine learning module 120 can give greater location weighting to pages closer to the beginning of the advertising material, or to the exterior-facing pages.

In further embodiments, the machine learning module 120 can also assign a weight to the product blocks 404 to 430 based on conspicuity of elements of the product blocks. For example, the machine learning module 120 can provide a higher conspicuity weighting to product blocks with a more striking background color, such as bright yellow or deep red. In another example, the machine learning module 120 can provide a higher conspicuity weighting to product blocks with an eye-catching outline or shape, such as a star-shaped product block. In another example, the machine learning module 120 can provide a higher conspicuity weighting to product blocks with a noticeable text font, such as a bolded product name.

In further embodiments, the machine learning module 120 can provide an overall weighting taking into consideration at least two of the properties of the product blocks, the properties including the area, the location and the conspicuity of the product blocks. The overall weighting can weigh the properties evenly or unevenly; for example, weighing area as twice as important as location and four times as important as conspicuity.
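
The uneven example above (area twice as important as location and four times as important as conspicuity) corresponds to importance factors in the ratio 4 : 2 : 1. The sketch below combines the three properties as a weighted average; it assumes each property has already been scaled to a common range such as 0 to 1, and the name `overall_weight` is hypothetical.

```python
def overall_weight(area, location, conspicuity, importance=(4.0, 2.0, 1.0)):
    """Weighted average of the three block properties.

    importance: relative importance of (area, location, conspicuity).
    The default 4:2:1 matches the uneven example in the text; use
    (1, 1, 1) for an even weighting.
    """
    w_area, w_location, w_conspicuity = importance
    total = w_area + w_location + w_conspicuity
    if total <= 0:
        raise ValueError("importance factors must sum to a positive value")
    return (w_area * area + w_location * location
            + w_conspicuity * conspicuity) / total
```

Setting one importance factor to zero yields a weighting over only two of the properties, which is the minimum the embodiment contemplates.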

In yet further embodiments, the system 100 can also include the adjustment module 124. At block 310, the adjustment module 124 allows a user to edit the promotional materials generated by the generation module 122 via the input interface 106. The editing can include, for example, removing a product from being displayed in the promotional materials, changing the length of the promotional materials, changing the prominence weighting of any of the product blocks, changing the properties of the product blocks including the area, the location and the conspicuity of the product blocks. The generation module 122 can then regenerate the promotional materials based on the changes received from the adjustment module 124. In some cases, the adjustment module 124 can keep a log on the database 116 of the changes requested by the user. In subsequent iterations of the system 100, the machine learning module 120 can take into consideration previous changes by the user, and weigh the product blocks accordingly. For example, if a user keeps vetoing placing a certain product in the promotional materials, the machine learning module 120 will refrain from placing that product on subsequent promotional materials.
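
The change log and the veto behavior can be sketched as follows. The in-memory class `AdjustmentLog`, the action label `"remove"`, and the threshold of three removals are hypothetical stand-ins; the description only states that a log is kept on the database 116 and that repeatedly vetoed products are no longer placed.

```python
from collections import Counter


class AdjustmentLog:
    """In-memory stand-in for the log of user edits (assumption)."""

    def __init__(self, veto_threshold=3):
        self.veto_threshold = veto_threshold
        self.entries = []            # every change, in the order received
        self._removals = Counter()   # product id -> number of removals

    def record(self, product, action):
        self.entries.append((product, action))
        if action == "remove":
            self._removals[product] += 1

    def excluded_products(self):
        """Products the user has removed at least `veto_threshold` times."""
        return {p for p, n in self._removals.items()
                if n >= self.veto_threshold}


def filter_candidates(candidates, log):
    """Drop repeatedly vetoed products before the next layout is generated."""
    blocked = log.excluded_products()
    return [p for p in candidates if p not in blocked]


log = AdjustmentLog(veto_threshold=3)
for _ in range(3):
    log.record("soup", "remove")
log.record("bread", "remove")
remaining = filter_candidates(["soup", "bread", "milk"], log)
```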

In some cases, the machine learning module 120 can use a machine learning model that includes a set of data mining and machine learning building blocks, working in conjunction with the other modules, to generate predictive or explanatory scores of the outcomes of the promotional materials. The scores can be based on the historical data, as described above, as well as, for example, the predicted selling quantities of one or more products due to the distribution of the promotional materials, the predicted profit or revenue generated by the promotional materials, the return on investment of the promotional materials, the cannibalization of other products based on the advertised products, the pull forward of the products, the halo effect on other products, and the like. Both supervised and unsupervised learning techniques can be used in generating the scores. These scores are then used by the machine learning module 120 to configure and lay out the promotional materials, as described herein.

The machine learning techniques used to generate these scores can be specific to the particular scoring source. There are a number of models that can be utilized by the machine learning module 120, as described herein.

The machine learning module 120 determines the best mode of achieving the goals of the marketing campaign specified by the user. In particular, the machine learning module 120 takes the inputs and constraints received via the input interface 106, and the outputted scores, and combines them to suggest to the machine learning module 120 values for the area, the location and the conspicuity of the product blocks. The machine learning module 120 then uses this suggestion to produce weightings as described above.

In some cases, the machine learning module 120 can be instantiated for each brand new set of data it receives. Prior promotional materials data can be imported into the system 100 to enable the machine learning module 120 to benefit from prior experience. This instantiation step is a mix of constrained optimization and conditional rules. The machine learning module 120 is instantiated because it may not have a base of decisions from which to learn what maximizes the objectives. In this case, there needs to be a base of matches for the reinforcement learning capability to have enough data to actually learn.

Beyond instantiation, the steady state of the machine learning module 120 shifts over to a reinforcement learning hybrid approach. As further data is collected by the system 100, the building blocks are re-trained and re-scored, and, as a result, new predictions are made to be provided to the machine learning module 120. This reinforcement learning and feedback approach can be invoked repeatedly to further hone the scores. As this process continues, various iterations occur whereby promotional materials are distributed, outcomes are received and the building blocks are re-trained and re-scored.

The initial weighting system in the instantiation process begins to give way to a machine intelligent approach to matching the outcomes to the effect of the promotional materials. The machine learning technique used prioritizes learning in an environment whose outcomes can be considered as being partly due to randomness and partly due to phenomena under the control of the system 100. For example, lucrative offers of a product in the promotional materials have an undoubted causal relationship to sales of that product; however, there may be other random factors which may have contributed to those sales.

In some cases, the machine learning model of the machine learning module 120 can take into account the seasonality of the products, as part of the historical data, in order to make recommendations based on the time of year of distribution of the promotional materials. As an example, sunscreen advertised in the winter to residents of a northern country would not be optimal. Thus, placing sunscreen products in advertisements that are distributed in the spring and summer is more likely to result in optimal sales effects. In some cases, a score is developed based on the seasonality of the advertised product for inclusion into the machine learning decision making, based, for example, on historical sales of the product around the date of distribution of the promotional materials or on input from the user.
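
One possible form of such a seasonality score is the ratio of a product's historical sales in the month of distribution to its average monthly sales. The sketch below assumes that definition, which is one of many that would fit the description; the function name `seasonality_score` and the sales figures are hypothetical.

```python
def seasonality_score(monthly_units, distribution_month):
    """Ratio of historical sales in the distribution month to the monthly mean.

    monthly_units: dict mapping month number (1-12) -> historical units sold.
    A score above 1 suggests the product is in season in that month.
    """
    if set(monthly_units) != set(range(1, 13)):
        raise ValueError("monthly_units must cover months 1 through 12")
    average = sum(monthly_units.values()) / 12
    if average <= 0:
        return 0.0
    return monthly_units[distribution_month] / average


# Hypothetical sunscreen sales for a northern country: peak in June-August.
sunscreen = {month: 10 for month in range(1, 13)}
sunscreen.update({6: 50, 7: 50, 8: 50})

july_score = seasonality_score(sunscreen, 7)
january_score = seasonality_score(sunscreen, 1)
```

Here the July score exceeds 1 and the January score falls below 1, consistent with the sunscreen example in the text.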

In some embodiments, the components of the system 10, 100 are stored by and executed on a single computer system. In other embodiments, the components of the system 10, 100 are distributed among two or more computer systems that may be locally or globally distributed.

The system 10, 100 can be seeded with historical data from which inferences can be drawn, enabling reinforcement learning to be employed using the historical data prior to the collection of further customer data by the system. That is, where previous outcomes of promotional materials exist, and that data is readily available and interpretable, the instantiation phase can be skipped entirely in favor of implementing the machine learning.

In some cases, the system 10, 100 can change or re-train the models with which the determinations themselves are being calculated.

In some cases, the system 10, 100 can perform reinforcement learning “concurrently” with the receiving of outcome data via various channels, enabling the machine learning module 30, 120 to continue to learn from outcomes, and in some cases, learn new weightings.

With enough interaction history, the machine learning module 30, 120, as described herein, can be considered an artificially intelligent agent.

While the embodiments described herein generally refer to analytics of promotions by consumer-facing establishments, the described embodiments can be used for manufacturers, distributors, agencies, service providers, other business-to-business entities, or any other establishment or firm which markets its products or services or which can provide products or services at a discount.

Although the invention has been described with reference to certain specific embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the spirit and scope of the invention as outlined in the claims appended hereto. The entire disclosures of all references recited above are incorporated herein by reference.

1-18. (canceled)
19. A computer-implemented method for generation of at least one output analytic for per-store unit demand, the method comprising: receiving historical data related to one or more products, the historical data comprising historical inventory level of the one or more products at a retail store; forecasting, using a demand machine learning model trained or instantiated with a demand training set, a demand for the one or more products at the retail store, the demand machine learning model comprising a first model for predicting the total unit demand for the retail store and a second model for predicting the demand in the retail store for the one or more products, the demand training set comprising the historical data, the forecast comprising multiplying the prediction of the total unit demand for the retail store for a predetermined time-period by the prediction of the demand in the retail store for the one or more products; and outputting the at least one output analytic to the user.
20. The method of claim 19, wherein the forecast further comprises adding a covariate for a stock out condition.